STREAM EDITOR
1. Introduction 


The stream editor (sed) is a noninteractive context editor that runs on the UNIX operating system. The 
sed software is designed to be especially useful in the following cases: 


• When editing files too large for comfortable interactive editing

• When editing a file of any size when the sequence of editing commands is too complicated to be comfortably
typed in interactive mode

• When performing multiple global editing functions efficiently in one pass through the input file.

Because only a few lines of the input file reside in memory at one time and no temporary files are used, the 
effective size of a file that can be edited is limited only by the requirement that the input and output files fit 
simultaneously into available secondary storage. 

Complicated editing scripts can be created separately and given to the sed program as a command file. For 
complex edits, this saves considerable typing and attendant errors. The sed program running from a command 
file is much more efficient than an interactive editor even if that editor can be driven by a prewritten script. 

The principal losses of function, compared with an interactive editor, are the lack of relative addressing
(because of the line-at-a-time operation) and the lack of immediate verification that a command has done what
was intended.


The sed program is a lineal descendant of the text editor, ed. Because of the differences between interactive 
and noninteractive operations, considerable changes have been made between ed and sed. 


2. Overall Operation 

The sed program by default copies the standard input to the standard output, perhaps performing one or 
more editing commands on each line before writing it to the output. This behavior may be modified by flags 
on the command line. (See Command Line Flags, paragraph 2.1.) 

The general format of an editing command is 

[address1,address2] function [arguments] 

One or both addresses may be omitted. Any number of blanks or tabs may separate the addresses from the func- 
tion. The function must be present. Arguments may be required or optional according to the function given. Tab 
characters and spaces at the beginning of lines are ignored. 
2.1 Command Line Flags 


Three flags are recognized on the command line: 


-n      tells the sed program not to copy all lines, but only those specified by p (print) functions or
        p flags after s (substitute) functions

-e      tells the sed program to take the next argument as an editing command

-f      tells the sed program to take the next argument as a file name; the file should contain editing
        commands, one to a line.
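
The effect of the three flags can be sketched with a few lines of throwaway input. The function names and the temporary script file are placeholders chosen for this illustration, and mktemp is assumed to be available on the system.

```shell
# Three lines of throwaway input.
input() { printf '%s\n' 'one' 'two' 'three'; }

# -e: the next argument is an editing command. Without -n every line is
# copied, so the line selected by 2p is written twice.
with_e() { input | sed -e '2p'; }

# -n: suppress the default copy; only the p function writes output.
with_n() { input | sed -n -e '2p'; }

# -f: the next argument names a file of editing commands, one to a line.
with_f() {
    script=$(mktemp)
    printf '%s\n' '2p' > "$script"
    input | sed -n -f "$script"
    rm -f "$script"
}

with_e
with_n
with_f
```

Keeping longer command lists in a file and passing them with -f avoids quoting the commands for the shell.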




DOCUMENT PROCESSING GUIDE ISSUE 1 6/82 


2.2 Order of Application of Editing Commands 


Before any input file is opened, all editing commands are compiled into a form which will be moderately 
efficient during the execution phase (when the commands are actually applied to lines of the input file). 


• Commands are compiled in the order encountered; this is generally the order in which they will be attempted
at execution time.

• Commands are applied one at a time; the input to each command is the output of all preceding commands.


The default linear order of application of editing commands can be changed by the t (test substitution) and 
b (branch) flow-of-control commands. When the order of application is changed by these commands, it remains 
true that the input line to any command is the output of any previously applied command. 


2.3 Pattern Space 


The range of pattern matches is called the pattern space. Ordinarily, the pattern space is one line of the 
input text, but more than one line can be read into the pattern space by using the next command (N). 


2.4 Examples 


Examples scattered throughout the following paragraphs use the following standard input text, except 
where noted: 


In Xanadu did Kubla Khan
A stately pleasure dome decree:
Where Alph, the sacred river, ran
Through caverns measureless to man
Down to a sunless sea.


The command 
2q 
will copy the first two lines of the input and quit. The output will be 


In Xanadu did Kubla Khan 
A stately pleasure dome decree: 
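
To try this at a terminal, the standard input text can be generated inline. In the sketch below, sample and first_two are helper names invented for this illustration, not part of sed.

```shell
# The five-line standard input text used by the examples.
sample() {
    printf '%s\n' \
        'In Xanadu did Kubla Khan' \
        'A stately pleasure dome decree:' \
        'Where Alph, the sacred river, ran' \
        'Through caverns measureless to man' \
        'Down to a sunless sea.'
}

# 2q: copy lines until line 2 has been written, then quit.
first_two() { sample | sed 2q; }

first_two
```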


3. Selecting Lines for Editing 


Input file lines that editing commands are to be applied to can be selected by addresses. Addresses may be 
either line numbers or context addresses. 


The application of a group of commands can be controlled by one address (or address pair) by grouping com- 
mands with curly braces ({ }). 


3.1 Line Number Addresses 


A line number is a decimal integer. As each line is read from the input, a line number counter is increment- 
ed. A line number address matches (selects) the input line causing the internal counter to equal the address 
line number. The counter runs cumulatively through multiple input files. It is not reset when a new input file 
is opened. As a special case, the $ character matches the last line of the last input file. 
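
The cumulative counter can be observed with two small files. This is a sketch only: the file names first and second and the variable names are made up for the illustration, and mktemp is assumed to be available.

```shell
dir=$(mktemp -d)
printf '%s\n' 'a1' 'a2' > "$dir/first"
printf '%s\n' 'b1' 'b2' > "$dir/second"

# Address 3 selects the first line of the second file, because the
# counter keeps running when the second file is opened.
third=$(sed -n '3p' "$dir/first" "$dir/second")

# $ selects the last line of the last input file.
last=$(sed -n '$p' "$dir/first" "$dir/second")

printf '%s\n' "$third" "$last"
rm -rf "$dir"
```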
3.2 Context Addresses


A context address is a pattern (a regular expression) enclosed in slashes (/.../). Regular expressions
recognized by the sed program are constructed as follows:

• An ordinary character is a regular expression and matches that character.

• A circumflex (^) at the beginning of a regular expression matches the null character at the beginning
of a line.

• A dollar sign ($) at the end of a regular expression matches the null character at the end of a line.

• The characters (\n) match an embedded newline character but not the newline character at the end
of the pattern space.

• A period (.), sometimes called dot, matches any character except the terminal newline character of the
pattern space.

• A regular expression followed by an asterisk (*) matches any number (including 0) of adjacent
occurrences of the regular expression it follows.


• A string of characters in square brackets ([ ]) matches any character in the string and no others. If,
however, the first character of the string is a circumflex (^), the regular expression matches any character
except the characters in the string and the terminal newline character of the pattern space. The circumflex
is the only metacharacter recognized within the square brackets. If ] needs to be in the set of square
brackets, it should be the first nonmetacharacter. For example:

[]...]      Includes ]
[^]...]     Does not include ]


• A concatenation of regular expressions is a regular expression which matches the concatenation of
strings matched by the components of the regular expression.

• A regular expression between the sequences \( and \) is identical in effect to the unadorned regular
expression but has side effects which are described under the s command (substitute function) below.


• The expression \d means the same string of characters matched by an expression enclosed in \( and
\) earlier in the same pattern. The d is a single digit; the string specified is that beginning with
occurrence d of \( counting from the left. For example, the following expression matches a line beginning
with two repeated occurrences of the same string:

/^\(.*\)\1/


• The null regular expression standing alone (e.g., //) is equivalent to the last regular expression
compiled.

To use one of the special characters (^ $ . * [ ] \ /) as a literal character (to match an occurrence of itself
in the input), the special character is preceded by a backslash (\).


For a context address to match the input, the whole pattern within the address must match some
portion of the pattern space.


3.3 Number of Addresses


Commands in the following paragraphs can have 0, 1, or 2 addresses. Under each command, the maximum 
number of allowed addresses is given. For a command to have more addresses than the maximum allowed is 
considered an error. 


If a command has no addresses, it is applied to every line in the input. 

If a command has one address, it is applied to all lines which match that address. 

If a command has two addresses, it is applied to the first line which matches the first address and to all
subsequent lines until (and including) the first subsequent line which matches the second address. An attempt
is then made on subsequent lines to match the first address again, and the process is repeated. Two addresses
are separated by a comma. Some examples are:


/an/            matches lines 1, 3, and 4 in the sample text
/an.*an/        matches line 1
/^an/           matches no lines
/./             matches all lines
/\./            matches line 5
/r*an/          matches lines 1, 3, and 4 (number = 0)
/\(an\).*\1/    matches line 1.

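
These context addresses can be checked against the sample text by printing the numbers of the selected lines with the = function, which is described under Miscellaneous Functions. The helpers sample and selected are names invented for this sketch, and paste is used only to put the numbers on one line.

```shell
sample() {
    printf '%s\n' \
        'In Xanadu did Kubla Khan' \
        'A stately pleasure dome decree:' \
        'Where Alph, the sacred river, ran' \
        'Through caverns measureless to man' \
        'Down to a sunless sea.'
}

# Print, on one line, the numbers of the sample lines that the context
# address given as $1 selects.
selected() { sample | sed -n "$1=" | paste -s -d ' ' -; }

selected '/an/'
selected '/an.*an/'
selected '/\./'
selected '/\(an\).*\1/'
```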
4. Functions 


Functions are named by a single alphabetic character. In the following function summaries, the maximum 
number of allowable addresses is enclosed in parentheses, followed by the single character function name. Possi- 
ble arguments are enclosed in angle brackets (< >), and a description of each function is given. Angle brackets 
around arguments are not part of the argument and should not be typed in actual editing commands. 


4.1 Whole Line Oriented Functions Summary 


(2)d The d function deletes from the file (does not write to the output) those lines matched by 
its addresses. It also has the side effect that no further commands are attempted on the 
corpse of a deleted line. As soon as the d function is executed, a new line is read from the 
input, and the list of editing commands is restarted from the beginning on the new line. 


(2)n The n function writes the current line to the output (if it should be) and then reads the next
line from the input, replacing the current line. The list of editing commands is continued
following the n command.

(1)a\
<text> The a function causes the argument <text> to be written to the output after the line
matched by its address. The a command is inherently multiline; a\ must appear at the end
of a line, and <text> may contain any number of lines. To preserve the one-command-to-a-line
fiction, interior newline characters must be hidden by a backslash character (\) immediately
preceding the newline character. The <text> is terminated by the first unhidden newline
character (the first one not immediately preceded by a backslash). Once an a function is
successfully executed, <text> will be written to the output regardless of what later commands
do to the line which triggered it. Even if that line is deleted, <text> will still be written
to the output. The <text> is not scanned for address matches, and no editing commands
are attempted on it. The a function does not cause a change in the line number counter.


(1)i\
<text> The i function behaves identically to the a function except that <text> is written to the
output before the matched line. All other comments about the a function apply to the i
function.


(2)c\
<text> The c function deletes lines selected by its addresses and replaces them with the lines in
<text>. Like a and i, c must be followed by a newline character hidden by a backslash;
interior newline characters in <text> must be hidden by backslashes. The c command may
have two addresses, and therefore select a range of lines. If it does, all lines in the range
are deleted, but only one copy of <text> is written to the output, not one copy per line
deleted. As with a and i, <text> is not scanned for address matches, and no editing commands
are attempted on it. It does not change the line number counter. After a line has been
deleted by a c function, no further commands are attempted on the corpse. If text is appended
after a line by a or r functions and the line is subsequently changed, the text inserted by
the c function will be placed before the text of the a or r functions (the r function is
described later).


For text put in the output by these functions, leading blanks and tabs will disappear as in sed commands.
To get leading blanks and tabs into the output, the first desired blank or tab is preceded by a backslash. The
backslash will not appear in the output. For example, the list of editing commands

n
a\
XXXX
d

applied to the standard input produces

In Xanadu did Kubla Khan
XXXX
Where Alph, the sacred river, ran
XXXX
Down to a sunless sea.


In this particular case, the same effect would be produced by either of the two following command lists:

n
i\
XXXX
d

or

n
c\
XXXX
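
The equivalence of the append and change forms can be checked by passing each command list to sed as one multi-line argument. This is a sketch: sample, append_list, and change_list are names invented here, and the behavior of n on the last input line is assumed to be that of common current implementations, which write the line and stop.

```shell
sample() {
    printf '%s\n' \
        'In Xanadu did Kubla Khan' \
        'A stately pleasure dome decree:' \
        'Where Alph, the sacred river, ran' \
        'Through caverns measureless to man' \
        'Down to a sunless sea.'
}

# n writes the odd-numbered line and reads the even-numbered one;
# a\ queues XXXX for output; d deletes the even-numbered line.
append_list() {
    sample | sed 'n
a\
XXXX
d'
}

# c\ deletes the even-numbered line and writes XXXX in its place.
change_list() {
    sample | sed 'n
c\
XXXX'
}

append_list
change_list
```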


4.2 Substitute Function 


The substitute function changes parts of lines selected by a context search within the line:


(2)s<pattern> <replacement> <flags>

The s function replaces the part of a line selected by <pattern> with <replacement>. It can be read
"Substitute for <pattern>, <replacement>."

4.2.1 Pattern 


The <pattern> argument contains a pattern exactly like the patterns in addresses. The only difference be- 
tween <pattern> and a context address is that the context address must be delimited by slash (/) characters; 
<pattern> may be delimited by any character other than space or newline. By default, only the first string 
matched by <pattern> is replaced unless the g flag (below) is invoked. 


4.2.2 Replacement 


The <replacement> argument begins immediately after the second delimiting character of <pattern> and
must be followed immediately by another instance of the delimiting character (thus there are exactly three
instances of the delimiting character). The <replacement> is not a pattern, and the characters which are
special in patterns do not have special meaning in <replacement>. Instead, other characters are special:


&       is replaced by the string matched by <pattern>.

\d      (d is a single digit) is replaced by the dth substring matched by the parts of <pattern>
        enclosed in \( and \). If nested substrings occur in <pattern>, substring d is determined
        by counting opening delimiters (\().

As in patterns, special characters may be made literal characters by preceding them with a backslash (\).


4.2.3 Flags 
The <flags> argument may contain the following: 


g Substitute <replacement> for all nonoverlapping instances of <pattern> in the line. After 
a successful substitution, the scan for the next instance of <pattern> begins just after the 
end of the inserted characters. Characters put into the line from <replacement> are not 
rescanned. 


p Print the line if a successful replacement was done. The p flag causes the line to be written 
to the output if and only if a substitution was actually made by the s function. If several 
s functions, each followed by a p flag, successfully substitute in the same input line, multi- 
ple copies of the line will be written to the output—one for each successful substitution. 


w <filename> Write the line to a file if a successful replacement was done. The w flag causes lines which
are actually substituted by the s function to be written to a file named by <filename>. If
<filename> exists before sed is run, it is overwritten; if not, it is created. A single space
must separate w and <filename>. The possibilities of multiple, somewhat different copies
of one input line being written are the same as for p. A maximum of ten different file
names may be mentioned after w flags and w functions.


4.2.4 Examples 
The command 

s/to/by/w changes 

applied to the standard input produces on the output 
In Xanadu did Kubla Khan 
A stately pleasure dome decree: 
Where Alph, the sacred river, ran 
Through caverns measureless by man 
Down by a sunless sea. 


and on the file changes 


Through caverns measureless by man 
Down by a sunless sea. 


If the no-copy option is in effect, the command 
s/[.,;?:]/*P&*/gp

produces 
A stately pleasure dome decree*P:* 
Where Alph*P,* the sacred river*P,* ran 
Down to a sunless sea*P.* 

To illustrate the effect of the g flag, the command 
/X/s/an/AN/p 

produces (assuming no-copy mode) 
In XANadu did Kubla Khan 

and the command 
/X/s/an/AN/gp 

produces 


In XANadu did Kubla KhAN 
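
The no-copy examples above can be reproduced as follows. This is a sketch: the function names are invented for the illustration, and the bracket expression in mark_punct is assumed to list the punctuation characters . , ; ? and :.

```shell
sample() {
    printf '%s\n' \
        'In Xanadu did Kubla Khan' \
        'A stately pleasure dome decree:' \
        'Where Alph, the sacred river, ran' \
        'Through caverns measureless to man' \
        'Down to a sunless sea.'
}

# Without g, only the first "an" on the addressed line is replaced.
first_only() { sample | sed -n '/X/s/an/AN/p'; }

# With g, every nonoverlapping "an" on the addressed line is replaced.
every_instance() { sample | sed -n '/X/s/an/AN/gp'; }

# & in the replacement stands for whatever the pattern matched.
mark_punct() { sample | sed -n 's/[.,;?:]/*P&*/gp'; }

first_only
every_instance
mark_punct
```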


4.3 Input/Output Functions 


(2)p The print function writes addressed lines to the standard output file. They are written at
the time the p function is encountered regardless of what succeeding editing commands
may do to the lines.

(2)w <filename> The write function writes addressed lines to the file named by <filename>. If the file
previously existed, it is overwritten; if not, it is created. The lines are written exactly as they
exist when the write function is encountered for each line regardless of what subsequent
editing commands may do to them. Exactly one space must separate the w and
<filename>. A maximum of ten different files may be mentioned in write functions and
w flags after s functions combined.


(1)r <filename> The read function reads the contents of <filename> and appends them after the line
matched by the address. The file is read and appended regardless of what subsequent
editing commands do to the line which matched its address. If r and a functions are
executed on the same line, the text from a functions and r functions is written to the output
in the order that the functions are executed. Exactly one space must separate the r and
<filename>. If a file mentioned by an r function cannot be opened, it is considered a null
file, not an error, and no diagnostic is given.


Note: Since there is a limit to the number of files that can be opened simultaneously, care should be 
taken that no more than ten files be mentioned in w functions or flags. That number is reduced by one 
if any r functions are present (only one read file is opened at a time). 


If the file note1 has the following contents

Note: Kubla Khan (more properly Kublai Khan;
1216-1294) was the grandson and most eminent
successor of Genghiz (Chingiz) Khan and founder
of the Mongol dynasty in China.

then the command

/Kubla/r note1

produces

In Xanadu did Kubla Khan
Note: Kubla Khan (more properly Kublai Khan;
1216-1294) was the grandson and most eminent
successor of Genghiz (Chingiz) Khan and founder
of the Mongol dynasty in China.
A stately pleasure dome decree:
Where Alph, the sacred river, ran
Through caverns measureless to man
Down to a sunless sea.
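
A sketch of the read function follows. It places the note file in a temporary directory, which is an assumption of this illustration (mktemp is assumed to be available), and also shows that a file which cannot be opened is treated as a null file.

```shell
sample() {
    printf '%s\n' \
        'In Xanadu did Kubla Khan' \
        'A stately pleasure dome decree:' \
        'Where Alph, the sacred river, ran' \
        'Through caverns measureless to man' \
        'Down to a sunless sea.'
}

dir=$(mktemp -d)
cat > "$dir/note1" <<'EOF'
Note: Kubla Khan (more properly Kublai Khan;
1216-1294) was the grandson and most eminent
successor of Genghiz (Chingiz) Khan and founder
of the Mongol dynasty in China.
EOF

# The file name runs to the end of the command line.
out=$(sample | sed "/Kubla/r $dir/note1")

# A read file that cannot be opened is silently treated as empty.
missing=$(sample | sed "/Kubla/r $dir/no-such-file")

printf '%s\n' "$out"
rm -rf "$dir"
```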


4.4 Multiple Input Line Functions 


Three functions, all spelled with capital letters, deal with pattern spaces containing embedded newline char- 
acters. They are intended principally to provide pattern matches across lines in the input. 


(2)N The next input line is appended to the current line in the pattern space. The two input lines 
are separated by an embedded newline character. Pattern matches may extend across 
embedded newline characters. 

(2)D Delete first part of the pattern space. Delete up to and including the first newline
character in the current pattern space. If the pattern space becomes empty (the only newline
character was the terminal newline character), read another line from the input. In any
case, begin the list of editing commands again from the beginning.


(2)P Print first part of the pattern space. Print up to and including the first newline character
in the pattern space.


The P and D functions are equivalent to their lowercase counterparts if there are no embedded newline 
characters in the pattern space. 
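
As an illustration of pattern spaces with embedded newline characters, the sketch below joins the sample text in pairs of lines and, separately, prints only the first line of each pair. The address $! (all lines except the last, using the ! function described under Flow of Control Functions) keeps N from being attempted when no next line exists; the function names are invented for this example.

```shell
sample() {
    printf '%s\n' \
        'In Xanadu did Kubla Khan' \
        'A stately pleasure dome decree:' \
        'Where Alph, the sacred river, ran' \
        'Through caverns measureless to man' \
        'Down to a sunless sea.'
}

# N appends the next line; the embedded newline is then replaced by a blank.
join_pairs() {
    sample | sed '$!N
s/\n/ /'
}

# P writes only the part of the pattern space before the first newline.
first_of_pairs() {
    sample | sed -n '$!N
P'
}

join_pairs
first_of_pairs
```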


4.5 Hold and Get Functions 




Five functions save and retrieve part of the input for possible later use.


(2)h Hold pattern space. The h function copies contents of the pattern space into a hold area
destroying previous contents.


(2)H Hold pattern space. The H function appends contents of the pattern space to contents of 
the hold area. Former and new contents are separated by a newline character. 


(2)g Get contents of hold area. The g function copies contents of the hold area into the pattern 
space destroying previous contents. 


(2)G Get contents of hold area. The G function appends contents of the hold area to contents 
of the pattern space. Former and new contents are separated by a newline character. 


(2)x Exchange. The exchange command interchanges contents of the pattern space and the hold
area.


For example, the commands

1h
1s/ did.*//
1x
G
s/\n/ :/

when applied to the standard input text, produce

In Xanadu did Kubla Khan :In Xanadu
A stately pleasure dome decree: :In Xanadu
Where Alph, the sacred river, ran :In Xanadu
Through caverns measureless to man :In Xanadu
Down to a sunless sea. :In Xanadu
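
A runnable sketch of this hold-area example follows. It assumes that the first three commands carry the line number address 1, that a blank precedes did in the second command, and that a blank precedes the colon in the final replacement. The names sample and tag_lines are invented for the illustration.

```shell
sample() {
    printf '%s\n' \
        'In Xanadu did Kubla Khan' \
        'A stately pleasure dome decree:' \
        'Where Alph, the sacred river, ran' \
        'Through caverns measureless to man' \
        'Down to a sunless sea.'
}

tag_lines() {
    # 1h: save line 1. 1s: cut line 1 down to "In Xanadu".
    # 1x: swap, so the hold area keeps "In Xanadu" and line 1 is restored.
    # G: append the hold area to every line. s: turn the newline into " :".
    sample | sed '1h
1s/ did.*//
1x
G
s/\n/ :/'
}

tag_lines
```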


4.6 Flow of Control Functions 


These functions do no editing on the input lines but control the application of functions to the lines selected
by the address part.


(2)! Don’t 
The don’t command causes the next command (written on the same line) to be applied to 
those input lines not selected by the address part. 

(2){ Grouping
The grouping command causes the next set of commands to be applied (or not applied) as
a block to the input lines selected by the addresses of the grouping command. The first of
the commands under control of the grouping may appear on the same line as the { or on
the next line. The group of commands is terminated by a matching } standing on a line by
itself. Groups can be nested.

(0):<label> Place a label
The label function marks a place in the list of editing commands which may be referred
to by b and t functions. The <label> may be any sequence of eight or fewer characters.
If two different colon functions have identical labels, a compile time diagnostic will be
generated, and no execution is attempted.

(2)b<label> Branch to label
The branch function causes the sequence of editing commands being applied to the current
input line to be restarted immediately after the place where a colon function with the same
<label> was encountered. If no colon function with the same label can be found after all
editing commands have been compiled, a compile time diagnostic is produced, and no
execution is attempted. A b function with no <label> is a branch to the end of the list of
editing commands. Whatever should be done with the current input line is done, and
another input line is read. The list of editing commands is restarted from the beginning on
the new line.

(2)t<label> Test substitutions
The t function tests whether any successful substitutions have been made on the current
input line; if so, it branches to <label>; if not, it does nothing. The flag which indicates
that a successful substitution has been executed is reset by reading a new input line or by
executing a t function.
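
The flow-of-control functions can be sketched as follows. The function names and the label again are invented for this illustration; the loop in squeeze is only one of several ways to reduce runs of blanks and is shown because it exercises the label and t functions.

```shell
sample() {
    printf '%s\n' \
        'In Xanadu did Kubla Khan' \
        'A stately pleasure dome decree:' \
        'Where Alph, the sacred river, ran' \
        'Through caverns measureless to man' \
        'Down to a sunless sea.'
}

# !: apply p to the lines NOT selected by the address.
without_an() { sample | sed -n '/an/!p'; }

# Grouping: both commands apply to each line of the addressed range.
grouped() {
    sample | sed -n '/Alph/,/man/{
s/an/AN/
p
}'
}

# Label and t: repeat the substitution for as long as it keeps succeeding,
# reducing every run of blanks on standard input to a single blank.
squeeze() {
    sed ':again
s/  / /
t again'
}

without_an
grouped
printf 'a    b  c\n' | squeeze
```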


4.7 Miscellaneous Functions 


(1)= The = function writes to standard output the line number of the line matched by its
address.

(1)q The q function causes the current line to be written to the output (if it should be), any
appended or read text to be written, and execution to be terminated.


MISCELLANEOUS FACILITIES


Several miscellaneous facilities, available as UNIX operating system commands, aid in the development of
documentation. These facilities are easy to access. Some of them are described briefly in the following list;
the User’s Manual—UNIX Operating System has more detailed descriptions.


bdiff The bdiff facility is used in a manner analogous to diff to find which lines must be
changed in two files to bring them into agreement. Its purpose is to allow processing of
files which are too large for diff.

cat The cat facility reads each file in sequence and writes it on the standard output. Thus

cat file

prints the file named file, and

cat file1 file2 >file3

concatenates file1 and file2 and places the result in file3.

cmp The cmp facility compares two files. Under default options, cmp makes no comment if the
files are the same; if they differ, it announces the byte and line number at which the
difference occurred.

comm The comm facility selects or rejects lines common to two sorted files. It reads file1 and
file2 and produces a 3-column output as follows: lines only in file1, lines only in file2, and
lines in both files.

diff The diff facility is a differential file comparator. It tells what lines must be changed in
two files to bring them into agreement.

diff3 The diff3 facility is a 3-way differential file comparator (for files up to 64K). It compares
three versions of a file and publishes disagreeing ranges of text flagged with special codes.

diffmk The diffmk facility marks the differences between files. It compares two versions of a file
and creates a third file that includes "change mark" commands for the nroff or troff
formatter.

grep Commands of the grep facility (grep, egrep, and fgrep) search the input files for lines
matching a pattern. Normally, each line found is copied to the standard output. The grep
patterns are limited regular expressions in the style of ed. The egrep patterns are full
regular expressions. The fgrep patterns are fixed strings.

pr The pr facility prints the named files on the standard output. If file is - or if no files are
specified, the standard input is assumed.

sdiff The sdiff facility uses the output of diff(1) to produce a side-by-side listing of two files
indicating those lines that are different. Each line of the two files is printed with a blank
gutter between them if the lines are identical, a < in the gutter if the line exists only in
file1, a > in the gutter if the line exists only in file2, and a | for lines that are different.

sort The sort facility sorts lines of all the named files together and writes the results on the
standard output.

spell The spell facility collects words from the named files and looks them up in a spelling list.
Words that neither occur in the spelling list nor can be derived from words in it are printed
on the standard output. The spellin and spellout are two additional subroutines of spell.


split The split facility splits a file into pieces. 


typo The typo facility searches through a document for unusual words, typographical errors, 
and hapax legomena and prints them on the standard output. 


uniq The uniq facility reports repeated lines in a file. It reads the input file comparing adjacent
lines. In the normal case, the second and succeeding copies of repeated lines are removed;
the remainder is written on the output file.
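
Several of these facilities are commonly combined. The sketch below uses made-up file contents in a temporary directory (mktemp is assumed to be available). The options comm -12 and cmp -s are standard options not described in the list above, and LC_ALL=C is set only to make the sort order predictable.

```shell
export LC_ALL=C
dir=$(mktemp -d)
printf '%s\n' pear apple pear fig > "$dir/file1"
printf '%s\n' fig kiwi apple > "$dir/file2"

# sort orders the lines; uniq then drops the adjacent repeated line.
sort "$dir/file1" | uniq > "$dir/sorted1"
sort "$dir/file2" > "$dir/sorted2"

# comm needs sorted input; -1 and -2 suppress the first two columns,
# leaving only the lines common to both files.
common=$(comm -12 "$dir/sorted1" "$dir/sorted2")

# cmp -s prints nothing and reports agreement through its exit status.
if cmp -s "$dir/sorted1" "$dir/sorted2"; then same=yes; else same=no; fi

printf '%s\n' "$common" "$same"
rm -rf "$dir"
```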